Goto

Collaborating Authors

 view angle




Viewpoint Generalization via a Binocular Vision

Neural Information Processing Systems

Convolutional Neural Networks (CNNs, LeCun et al. (1989, 1998)) are models inspired by how the Despite giving impressive performance in many applications, CNNs still have a long way to go in terms of being comparable to human's visual ability. The two images from the left eye and the right eye are merged and then fed into regular CNN layers.


dabd8d2ce74e782c65a973ef76fd540b-AuthorFeedback.pdf

Neural Information Processing Systems

Fig. (b) shows the performance of vanilla CNN with Fig. (c) shows that CNN The experimental settings are toy-like. ICLR'18, which considered only grayscale images. The neural network backbone is weak. The results are shown in Fig. (e). We hope our above explanation relieves your concerns, and if so, please consider raising your score.


Adaptive Planning Framework for UAV-Based Surface Inspection in Partially Unknown Indoor Environments

arXiv.org Artificial Intelligence

Inspecting indoor environments such as tunnels, industrial facilities, and construction sites is essential for infrastructure monitoring and maintenance. While manual inspection in these environments is often time-consuming and potentially hazardous, Unmanned Aerial Vehicles (UAVs) can improve efficiency by autonomously handling inspection tasks. Such inspection tasks usually rely on reference maps for coverage planning. However, in industrial applications, only the floor plans are typically available. The unforeseen obstacles not included in the floor plans will result in outdated reference maps and inefficient or unsafe inspection trajectories. In this work, we propose an adaptive inspection framework that integrates global coverage planning with local reactive adaptation to improve the coverage and efficiency of UAV-based inspection in partially unknown indoor environments. Experimental results in structured indoor scenarios demonstrate the effectiveness of the proposed approach in inspection efficiency and achieving high coverage rates with adaptive obstacle handling, highlighting its potential for enhancing the efficiency of indoor facility inspection.


Self-Modeling Robots by Photographing

arXiv.org Artificial Intelligence

Self-modeling enables robots to build task-agnostic models of their morphology and kinematics based on data that can be automatically collected, with minimal human intervention and prior information, thereby enhancing machine intelligence. Recent research has highlighted the potential of data-driven technology in modeling the morphology and kinematics of robots. However, existing self-modeling methods suffer from either low modeling quality or excessive data acquisition costs. Beyond morphology and kinematics, texture is also a crucial component of robots, which is challenging to model and remains unexplored. In this work, a high-quality, texture-aware, and link-level method is proposed for robot self-modeling. We utilize three-dimensional (3D) Gaussians to represent the static morphology and texture of robots, and cluster the 3D Gaussians to construct neural ellipsoid bones, whose deformations are controlled by the transformation matrices generated by a kinematic neural network. The 3D Gaussians and kinematic neural network are trained using data pairs composed of joint angles, camera parameters and multi-view images without depth information. By feeding the kinematic neural network with joint angles, we can utilize the well-trained model to describe the corresponding morphology, kinematics and texture of robots at the link level, and render robot images from different perspectives with the aid of 3D Gaussian splatting. Furthermore, we demonstrate that the established model can be exploited to perform downstream tasks such as motion planning and inverse kinematics.


Reinforcement Learning for SAR View Angle Inversion with Differentiable SAR Renderer

arXiv.org Artificial Intelligence

The electromagnetic inverse problem has long been a research hotspot. This study aims to reverse radar view angles in synthetic aperture radar (SAR) images given a target model. Nonetheless, the scarcity of SAR data, combined with the intricate background interference and imaging mechanisms, limit the applications of existing learning-based approaches. To address these challenges, we propose an interactive deep reinforcement learning (DRL) framework, where an electromagnetic simulator named differentiable SAR render (DSR) is embedded to facilitate the interaction between the agent and the environment, simulating a human-like process of angle prediction. Specifically, DSR generates SAR images at arbitrary view angles in real-time. And the differences in sequential and semantic aspects between the view angle-corresponding images are leveraged to construct the state space in DRL, which effectively suppress the complex background interference, enhance the sensitivity to temporal variations, and improve the capability to capture fine-grained information. Additionally, in order to maintain the stability and convergence of our method, a series of reward mechanisms, such as memory difference, smoothing and boundary penalty, are utilized to form the final reward function. Extensive experiments performed on both simulated and real datasets demonstrate the effectiveness and robustness of our proposed method. When utilized in the cross-domain area, the proposed method greatly mitigates inconsistency between simulated and real domains, outperforming reference methods significantly.


Differentiable Uncalibrated Imaging

arXiv.org Artificial Intelligence

We propose a differentiable imaging framework to address uncertainty in measurement coordinates such as sensor locations and projection angles. We formulate the problem as measurement interpolation at unknown nodes supervised through the forward operator. To solve it we apply implicit neural networks, also known as neural fields, which are naturally differentiable with respect to the input coordinates. We also develop differentiable spline interpolators which perform as well as neural networks, require less time to optimize and have well-understood properties. Differentiability is key as it allows us to jointly fit a measurement representation, optimize over the uncertain measurement coordinates, and perform image reconstruction which in turn ensures consistent calibration. We apply our approach to 2D and 3D computed tomography, and show that it produces improved reconstructions compared to baselines that do not account for the lack of calibration. The flexibility of the proposed framework makes it easy to extend to almost arbitrary imaging problems.


Annual field-scale maps of tall and short crops at the global scale using GEDI and Sentinel-2

arXiv.org Artificial Intelligence

Crop type maps are critical for tracking agricultural land use and estimating crop production. Remote sensing has proven an efficient and reliable tool for creating these maps in regions with abundant ground labels for model training, yet these labels remain difficult to obtain in many regions and years. NASA's Global Ecosystem Dynamics Investigation (GEDI) spaceborne lidar instrument, originally designed for forest monitoring, has shown promise for distinguishing tall and short crops. In the current study, we leverage GEDI to develop wall-to-wall maps of short vs tall crops on a global scale at 10 m resolution for 2019-2021. Specifically, we show that (1) GEDI returns can reliably be classified into tall and short crops after removing shots with extreme view angles or topographic slope, (2) the frequency of tall crops over time can be used to identify months when tall crops are at their peak height, and (3) GEDI shots in these months can then be used to train random forest models that use Sentinel-2 time series to accurately predict short vs. tall crops. Independent reference data from around the world are then used to evaluate these GEDI-S2 maps. We find that GEDI-S2 performed nearly as well as models trained on thousands of local reference training points, with accuracies of at least 87% and often above 90% throughout the Americas, Europe, and East Asia. Systematic underestimation of tall crop area was observed in regions where crops frequently exhibit low biomass, namely Africa and South Asia, and further work is needed in these systems. Although the GEDI-S2 approach only differentiates tall from short crops, in many landscapes this distinction goes a long way toward mapping the main individual crop types. The combination of GEDI and Sentinel-2 thus presents a very promising path towards global crop mapping with minimal reliance on ground data.


Zero Touch Coordinated UAV Network Formation for 360{\deg} Views of a Moving Ground Target in Remote VR Applications

arXiv.org Artificial Intelligence

Unmanned aerial vehicles (UAVs) with on-board cameras are widely used for remote surveillance and video capturing applications. In remote virtual reality (VR) applications, multiple UAVs can be used to capture different partially overlapping angles of the ground target, which can be stitched together to provide 360{\deg} views. This requires coordinated formation of UAVs that is adaptive to movements of the ground target. In this paper, we propose a joint UAV formation and tracking framework to capture 360{\deg} angles of the target. The proposed framework uses a zero touch approach for automated and adaptive reconfiguration of multiple UAVs in a coordinated manner without the need for human intervention. This is suited to both military and civilian applications. Simulation results demonstrate the convergence and configuration of the UAVs with arbitrary initial locations and orientations. The performance has been tested for various number of UAVs and different mobility patterns of the ground target.